Maintaining a healthy and consistent diet is one of our most basic and important needs. Access to food sources, however, is limited for many people, which leads to food insecurity. In 2010, 14.5% of U.S. households (17.2 million) were food insecure, including 5.4% of all households that had very low food security [1].
“Where’s my next meal going to come from?”
One factor that may determine whether households have a good and consistent supply of nutritious food is their access to food. Food accessibility is associated with proximity to grocery stores, food markets, and other food sources. This project looks at the scope of food accessibility for households in different counties. It also covers the effect of food environment factors such as income, population demographics, and the availability of nutrition programs on food accessibility and food insecurity.
Data
The project uses the Food Environment Atlas data from 9/10/2020 (Economic Research Service, n.d.). The dataset was compiled by the US Department of Agriculture (USDA) Economic Research Service (ERS) to study factors that affect food choices and the accessibility of healthy foods in communities. Its information was aggregated from various reports from the USDA, the Bureau of the Census, the U.S. Department of Commerce, and more for the years 2006 to 2019. Food accessibility population data, which is the focus of the project, were given for 2010 and 2015.
The dataset contains county-level information on food environment factors such as access to grocery stores/supermarkets/restaurants, local food sales, food prices, food assistance programs like SNAP (Supplemental Nutrition Assistance Program), National School Lunch Program, etc., socioeconomic characteristics, and some health/physical activities. State-level information on household food insecurity is also given. The dataset contains data on 3143 counties, each uniquely identified by its FIPS code.
To enable geographic visualizations and analysis, an additional file from the ArcGIS Hub containing geographic boundary shapes is used.
Data preprocessing
The information in the dataset was stored in separate worksheets, grouped according to the category of the features. Since there was a large number of features, the first step was to perform feature reduction. This was done by manually selecting the features that were related to the project’s questions and goals. Features that contained very granular information were also omitted in order to simplify analysis and to obtain more generalized insights.
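The worksheet selection and merge described above could look roughly like the following sketch. The sheet names, the chosen columns, and all values are assumptions made for illustration; the project's actual selection is not listed in this write-up.

```python
from functools import reduce

import pandas as pd

# Made-up stand-ins for two Atlas worksheets. A real run might load them with
# pd.read_excel(path, sheet_name=None), which returns a dict of DataFrames.
sheets = {
    "ACCESS": pd.DataFrame({
        "FIPS": [1001, 1003],
        "PCT_LACCESS_POP10": [30.0, 20.0],
        "LACCESS_POP10": [15000.0, 36000.0],  # granular count, dropped below
    }),
    "SOCIOECONOMIC": pd.DataFrame({
        "FIPS": [1001, 1003],
        "MEDHHINC15": [55000.0, 52000.0],
    }),
}

# Hypothetical manual feature selection: columns to keep from each worksheet.
keep = {
    "ACCESS": ["PCT_LACCESS_POP10"],
    "SOCIOECONOMIC": ["MEDHHINC15"],
}

# Keep the FIPS key plus the selected columns, then merge all sheets on FIPS.
selected = [sheets[name][["FIPS"] + cols] for name, cols in keep.items()]
combined = reduce(
    lambda left, right: left.merge(right, on="FIPS", how="outer"),
    selected,
)
print(combined)
```

Merging on the FIPS code rather than on county names avoids problems with counties that share a name across states.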
The dataset and the boundary shape file were also compared to check whether county FIPS codes and names matched for all observations. Between 2010 and 2015, some county names and FIPS codes changed, which caused discrepancies between the two datasets. These counties were identified and modified to reflect the changes and match all observations. This brought the number of observations from 3143 to 3142 counties.
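One way to find such mismatches is an outer merge on the FIPS code. This is a minimal sketch, not the project's actual code: the two small tables are illustrative rows rather than the project's data, and `normalize_fips` is a hypothetical helper.

```python
import pandas as pd

# Illustrative stand-ins for the Atlas table and the boundary file attributes.
atlas = pd.DataFrame({
    "FIPS": [1001, 2270, 46113, 51515],
    "County": ["Autauga", "Wade Hampton", "Shannon", "Bedford"],
})
shapes = pd.DataFrame({
    "FIPS": ["01001", "02158", "46102"],
    "NAME": ["Autauga", "Kusilvak", "Oglala Lakota"],
})


def normalize_fips(series: pd.Series) -> pd.Series:
    """Return FIPS codes as zero-padded 5-character strings."""
    return series.astype(str).str.zfill(5)


atlas["FIPS"] = normalize_fips(atlas["FIPS"])
shapes["FIPS"] = normalize_fips(shapes["FIPS"])

# indicator=True adds a _merge column that marks rows found in only one table.
merged = atlas.merge(shapes, on="FIPS", how="outer", indicator=True)
only_atlas = sorted(merged.loc[merged["_merge"] == "left_only", "FIPS"])
only_shapes = sorted(merged.loc[merged["_merge"] == "right_only", "FIPS"])

print("In Atlas only:", only_atlas)
print("In boundary file only:", only_shapes)
```

Normalizing both sides to zero-padded strings matters because a FIPS code read as an integer loses its leading zero, and the choropleth lookup compares the codes as they appear in the GeoJSON properties.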
For columns with a low number of missing values, imputation was done using the median of those features. This was done while looking at variable distributions to ensure that imputation did not significantly change them. However, median imputation was not appropriate for some 2015 variables that had a larger number of missing values. For these variables, missing values were replaced with values from the corresponding 2010 feature.
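The two imputation rules can be sketched as follows. The 30% cutoff between "low" and "larger" numbers of missing values, the `impute` helper, and the sample values are all assumptions; the write-up does not state the threshold that was actually used.

```python
import numpy as np
import pandas as pd

# Made-up sample with one sparsely and one heavily missing column.
df_demo = pd.DataFrame({
    "PCT_LACCESS_POP10": [10.0, 20.0, 30.0, 100.0],
    "PCT_LACCESS_POP15": [12.0, np.nan, np.nan, np.nan],
    "MEDHHINC15": [40000.0, np.nan, 50000.0, 60000.0],
})


def impute(df, fallback_pairs, max_missing_share=0.3):
    """Median-impute sparse gaps; otherwise copy from a paired earlier column."""
    out = df.copy()
    for col in out.columns:
        share = out[col].isna().mean()
        if share == 0:
            continue
        if share <= max_missing_share:
            # Few gaps: the column median barely shifts the distribution.
            out[col] = out[col].fillna(out[col].median())
        elif col in fallback_pairs:
            # Many gaps: reuse the matching 2010 value for the same county.
            out[col] = out[col].fillna(out[fallback_pairs[col]])
    return out


clean = impute(df_demo, {"PCT_LACCESS_POP15": "PCT_LACCESS_POP10"})
print(clean)
```

Comparing `df_demo[col].describe()` with `clean[col].describe()` for each imputed column is a quick way to confirm that the distributions did not shift noticeably.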
Overall, the data preprocessing involved merging data, selecting relevant variables, and imputing missing values while exploring the nature of the variables. The cleaned dataset was then saved to a new file that is used for analysis.
Insights from Visualization
Using the cleaned dataset, some variables of interest were explored by producing visualizations.
Food accessibility in Counties
To gauge the level of food accessibility in counties, the dataset provides the percentage of the population with low food accessibility. For this dataset, the low-accessibility population consists of those “living more than 1 mile from a supermarket or large grocery store if in an urban area, or more than 10 miles from a supermarket or large grocery store if in a rural area” [2].
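The quoted definition amounts to a distance threshold that depends on whether an area is urban or rural. The sketch below applies that rule to a made-up list of residents; the function names and the sample are hypothetical, and the Atlas itself ships precomputed percentages rather than individual distances.

```python
def is_low_access(distance_miles: float, is_urban: bool) -> bool:
    """Apply the quoted rule: over 1 mile (urban) or over 10 miles (rural)."""
    limit = 1.0 if is_urban else 10.0
    return distance_miles > limit


def pct_low_access(residents: list[tuple[float, bool]]) -> float:
    """Percentage of residents flagged as low access."""
    flagged = sum(is_low_access(dist, urban) for dist, urban in residents)
    return 100.0 * flagged / len(residents)


# (distance to nearest supermarket in miles, lives in an urban area)
sample = [(0.4, True), (1.5, True), (8.0, False), (12.0, False)]
share = pct_low_access(sample)
print(share)
```

Under this rule, the same 8-mile distance counts as low access in an urban area but not in a rural one.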
```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Load the cleaned dataset produced by the preprocessing step
df = pd.read_pickle(r'./../data/AtlasCleaned.pkl')
sns.set(style='darkgrid')


def plt_dis(c):
    # Histogram of a county-level percentage column
    f = sns.displot(data=df, x=c, height=4, aspect=10/8.27, bins=20)
    plt.xlim([0, 100])
    plt.ylabel("Number of Counties", alpha=0.8)
    plt.xlabel("Low Food Access Pop. %", alpha=0.8)
    plt.show()


plt_dis('PCT_LACCESS_POP10')
```
Figure 1: Distribution of County % Population with Low Food Accessibility
The distribution reflects an interesting characteristic of food accessibility in the U.S. Its two distinct modes suggest that counties fall into two distinct groups of low and very low accessibility. For the majority of counties, less than 40% of the population has low food accessibility, while for around 130 counties the entire population has low access to food sources.
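The two figures quoted from the histogram can also be computed directly from the column. This is a minimal sketch on a made-up series; with the real data, `pct` would be `df['PCT_LACCESS_POP10']`, and the 40% and 100% cutoffs are simply the ones named in the text.

```python
import pandas as pd

# Made-up county percentages standing in for df['PCT_LACCESS_POP10'].
pct = pd.Series([5.0, 12.5, 22.0, 38.0, 55.0, 100.0, 100.0, 31.0])

# Share of counties where under 40% of the population has low access.
share_below_40 = (pct < 40).mean()

# Number of counties where the entire population has low access.
n_full = int((pct >= 100).sum())

print(f"{share_below_40:.1%} of counties below 40%, {n_full} at 100%")
```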
Geographic Visualization of Food Accessibility
```python
import json

import geopandas as gpd
import plotly.express as px
import plotly.io as pio
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.read_pickle(r'./../data/AtlasCleaned.pkl')

# County boundary shapes used to draw the map
with open(r'./../src/USA_Counties_(Generalized).geojson') as f:
    tract = json.load(f)


def plot_chlpth(df, x, label, header):
    # Match each row's FIPS code to the FIPS property of a GeoJSON feature
    fig = px.choropleth(df, locations='FIPS', color=x,
                        color_continuous_scale="Viridis",
                        range_color=(0, 100),
                        geojson=tract,
                        featureidkey="properties.FIPS",
                        scope="usa",
                        hover_data=['State', 'County'],
                        labels=label,
                        title=header)
    fig.update_layout(legend=dict(orientation="h", yanchor="top", y=1.02,
                                  xanchor="right", x=1))
    fig.show()


plot_chlpth(df, 'PCT_LACCESS_POP10', {'PCT_LACCESS_POP10': 'Popn. %'}, None)
```
Figure 2: Low Food Accessibility Population Percentages in 2010
The choropleth shows that low-accessibility counties are concentrated in the Midwest, Southwest, and West regions of the country. Since food accessibility is defined by the proximity of households to food stores, the location of the low-access counties may depend on housing and geographic conditions.
Food Accessibility and Vehicles
```python
def plt_dis(c):
    # Bivariate histogram: column c on the x-axis, low-access population % on the y-axis
    f = sns.displot(data=df, x=c, y='PCT_LACCESS_POP10', height=4,
                    aspect=10/8.27, bins=10)
    plt.xlim([-5, 100])
    plt.ylabel("Low Food Access Pop. %", alpha=0.9)
    plt.xlabel("Household % with Low Food Access & No Vehicle", alpha=0.8)
    plt.show()


plt_dis('PCT_LACCESS_HHNV10')
```
Figure 3: Low Food Access Population and Household Vehicle Availability
The density plot of household vehicle availability and county food accessibility helps to show the extent of food insecurity in the country. The blocks farthest from both axes represent counties where both food accessibility and vehicle availability are low. These counties are likely to face the biggest food accessibility challenges.
For other counties, although food accessibility is poor, most households appear to have access to vehicles, which helps offset the distance to food stores. For most counties, under 10% of households have both low access to food and no vehicle.
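The blocks in the density plot are counts of counties per two-dimensional bin. The sketch below reproduces that binning with NumPy on made-up values; the fixed 0 to 100 range is an assumption chosen for readability, whereas the plotting call derives its bin edges from the data.

```python
import numpy as np

# Made-up values: x = % households with low access and no vehicle,
# y = % population with low access.
x = np.array([1.0, 2.0, 3.0, 8.0, 25.0])
y = np.array([12.0, 22.0, 15.0, 90.0, 95.0])

# counts[i, j] is the number of counties in x-bin i and y-bin j (10-point bins).
counts, x_edges, y_edges = np.histogram2d(
    x, y, bins=10, range=[[0, 100], [0, 100]]
)

# Share of counties in the first x-bin, i.e. under 10% of households.
share_under_10 = counts[0, :].sum() / counts.sum()
print(share_under_10)
```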
Food Accessibility and Income
```python
sns.set(color_codes=True)
f, ax = plt.subplots(figsize=(8, 8))
sns.scatterplot(data=df, x='PCT_LACCESS_POP10', y='PCT_LACCESS_LOWI10')
ax.set_ylabel("County Population % with Low Access and Low Income", size=12, alpha=0.9)
ax.set_xlabel("County Population % with Low Food Access", size=12, alpha=0.9)
ax.tick_params(axis='both', which='major', labelsize=14)
plt.xlim([0, 102])
# plt.show() works with the inline (non-GUI) notebook backend, unlike f.show()
plt.show()
```
Figure 4: Low Food Access Population and Low-Income Population with Low Food Access
The scatter plot compares the population with low food accessibility against the population with both low income and low food access. The graph again points to the presence of two groups of low and very low food access. For counties with poor food accessibility, the share of the population with both low access and low income is greater and more varied.
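The "greater and more varied" observation can be checked numerically by comparing the mean and spread of the low-income, low-access share between the two groups. This is a sketch on made-up pairs; the 80% cutoff separating the groups is an assumption, not a value the analysis above defines for this plot.

```python
from statistics import mean, pstdev

# Made-up (PCT_LACCESS_POP10, PCT_LACCESS_LOWI10) pairs, one per county.
rows = [
    (10.0, 3.0), (20.0, 5.0), (30.0, 8.0),
    (95.0, 20.0), (100.0, 45.0), (100.0, 31.0),
]
CUTOFF = 80.0  # hypothetical boundary between the two groups


def summarize(values):
    """Mean and population standard deviation of a list of percentages."""
    return {"mean": mean(values), "sd": pstdev(values)}


lower = summarize([lowi for access, lowi in rows if access <= CUTOFF])
higher = summarize([lowi for access, lowi in rows if access > CUTOFF])

print("lower-access-share group:", lower)
print("higher-access-share group:", higher)
```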
Demographic
In counties where >80% of the population has low food accessibility, the majority of the population is white, followed by Hispanic and American Indian / Alaska Native.
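A ranking like this could be obtained by filtering on the 80% threshold and averaging the race and ethnicity columns. The sketch below uses made-up values, and the column names are assumed to follow the Atlas naming; it also takes an unweighted mean across counties, which may differ from how the statement above was derived.

```python
import pandas as pd

# Made-up county rows; column names are assumed to match the Atlas.
demo = pd.DataFrame({
    "PCT_LACCESS_POP10": [100.0, 92.0, 85.0, 20.0],
    "PCT_NHWHITE10": [80.0, 60.0, 70.0, 50.0],
    "PCT_HISP10": [10.0, 25.0, 10.0, 20.0],
    "PCT_NHNA10": [5.0, 10.0, 15.0, 1.0],
    "PCT_NHBLACK10": [2.0, 3.0, 4.0, 25.0],
})
race_cols = ["PCT_NHWHITE10", "PCT_HISP10", "PCT_NHNA10", "PCT_NHBLACK10"]

# Keep counties where more than 80% of the population has low access.
high = demo[demo["PCT_LACCESS_POP10"] > 80]

# Unweighted mean share of each group, largest first.
ranking = high[race_cols].mean().sort_values(ascending=False)
print(ranking)
```

Weighting each county by its population would answer a slightly different question: the makeup of all people living in those counties rather than of the typical county.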
Models
regression to model insecurity
Results
Discussion
Conclusion
Source Code
---
title: "Food Accessibility in U.S. Counties"
author: "Tenzin Sherpa"
bibliography: references.bib
number-sections: false
format:
  html:
    theme: default
    rendering: default
    code-fold: true
    code-tools: true
  pdf: default
jupyter: python3
---